Online Manifold Regularization: A New Learning Setting and Empirical Study
Abstract
We consider a novel “online semi-supervised learning” setting in which (mostly unlabeled) data arrive sequentially in large volume, making it impractical to store all of the data before learning. We propose an online manifold regularization algorithm. Unlike standard online learning, it learns even when the incoming point is unlabeled. Our algorithm is based on convex programming in kernel space with stochastic gradient descent, and it inherits the theoretical guarantees of standard online algorithms. However, a naïve implementation of our algorithm does not scale well. This paper focuses on efficient, practical approximations; we discuss two sparse approximations, one based on buffering and one based on online random projection trees. Experiments show that our algorithm achieves risk and generalization accuracy comparable to standard batch manifold regularization, while each step runs quickly. Our online semi-supervised learning setting is an interesting direction for further theoretical development, paving the way for semi-supervised learning to work on real-world lifelong learning tasks.
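The abstract describes the method only at a high level. As a rough illustration, here is a minimal Python sketch of how a buffered online manifold-regularization learner could work, under assumptions not stated in the abstract: a squared loss on labeled points, a Gaussian kernel used both as the RKHS kernel and as the graph edge weight, a constant step size, and a fixed-size buffer that simply discards its oldest point. The names `OnlineManifoldReg`, `rbf`, `lam1`, `lam2`, `label_weight`, and `buffer_size` are hypothetical; this is not the authors' implementation, and the random-projection-tree approximation is not shown.

```python
import math
from collections import deque


def rbf(a, b, sigma=1.0):
    """Gaussian kernel. Assumption: it serves as both the RKHS kernel K and the edge weight w."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


class OnlineManifoldReg:
    """Hypothetical buffered online manifold regularization via kernel SGD.

    f(x) = sum_j alpha_j * K(x_j, x), summed over the points currently in the buffer.
    Each step descends the (assumed) instantaneous risk
        label_weight * 0.5 * (f(x_t) - y_t)^2        (only if x_t is labeled)
        + (lam1 / 2) * ||f||_K^2
        + lam2 * sum_{i in buffer} w_it * (f(x_i) - f(x_t))^2
    """

    def __init__(self, lam1=0.01, lam2=0.1, label_weight=1.0, eta=0.1,
                 buffer_size=50, sigma=1.0):
        self.lam1, self.lam2 = lam1, lam2
        self.label_weight = label_weight  # assumed up-weighting of the rare labeled points
        self.eta = eta                    # constant step size (assumption)
        self.sigma = sigma
        # Each entry is [x, alpha]; deque(maxlen=...) drops the oldest entry when full.
        self.buf = deque(maxlen=buffer_size)

    def predict(self, x):
        return sum(alpha * rbf(xj, x, self.sigma) for xj, alpha in self.buf)

    def step(self, x, y=None):
        """One SGD step at the new point x; pass y=None for an unlabeled point."""
        eta = self.eta
        f_t = self.predict(x)
        # Evaluate the gradient at the current f, before any coefficient changes.
        terms = [(entry, rbf(entry[0], x, self.sigma), self.predict(entry[0]) - f_t)
                 for entry in self.buf]
        # The (lam1 / 2) * ||f||^2 term has gradient lam1 * f: shrink every coefficient.
        for entry in self.buf:
            entry[1] *= 1.0 - eta * self.lam1
        # Manifold term: gradient 2 * lam2 * w * d * (K(x_i, .) - K(x_t, .)) per buffered point.
        alpha_t = 0.0
        for entry, w, d in terms:
            entry[1] -= 2.0 * eta * self.lam2 * w * d
            alpha_t += 2.0 * eta * self.lam2 * w * d
        # Squared loss, labeled points only: gradient label_weight * (f(x_t) - y) * K(x_t, .).
        if y is not None:
            alpha_t -= eta * self.label_weight * (f_t - y)
        self.buf.append([x, alpha_t])
        return f_t
```

For example, calling `step((0.0,), 1.0)` and then `step((0.3,))` gives the unlabeled point at 0.3 a small positive coefficient, so the label spreads to nearby unlabeled data. Truncating the sum to a fixed-size buffer is what keeps each step's cost bounded; dropping old points also drops their contribution to `f`, which is the price of this particular sparse approximation.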
Similar articles
Lecture 6: Manifold Regularization
We first analyze the limits of learning in high dimension. Hence, we stress the difference between high dimensional ambient space and intrinsic geometry associated to the marginal distribution. We observe that, in the semi-supervised setting, unlabeled data could be used to exploit low dimensionality of the intrinsic geometry. In order to formalize these intuitions we briefly introduce the mani...
Stochastic Convex Optimization
For supervised classification problems, it is well known that learnability is equivalent to uniform convergence of the empirical risks and thus to learnability by empirical minimization. Inspired by recent regret bounds for online convex optimization, we study stochastic convex optimization, and uncover a surprisingly different situation in the more general setting: although the stochastic conv...
Parameter-Free Spectral Kernel Learning
Due to the growing ubiquity of unlabeled data, learning with unlabeled data is attracting increasing attention in machine learning. In this paper, we propose a novel semi-supervised kernel learning method which can seamlessly combine manifold structure of unlabeled data and Regularized Least-Squares (RLS) to learn a new kernel. Interestingly, the new kernel matrix can be obtained analytically w...
Multi-view Laplacian Support Vector Machines
We propose a new approach, multi-view Laplacian support vector machines (SVMs), for semi-supervised learning in the multi-view scenario. It integrates manifold regularization and multi-view regularization into the usual formulation of SVMs and is a natural extension of SVMs from supervised learning to multi-view semi-supervised learning. The function optimization problem in a reproducing kern...
ReLISH: Reliable Label Inference via Smoothness Hypothesis
The smoothness hypothesis is critical for graph-based semi-supervised learning. This paper defines local smoothness, based on which a new algorithm, Reliable Label Inference via Smoothness Hypothesis (ReLISH), is proposed. ReLISH has produced smoother labels than some existing methods for both labeled and unlabeled examples. Theoretical analyses demonstrate good stability and generalizability o...